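Visual-inertial odometry and SLAM algorithms are widely used in various fields, such as service robots, drones, and autonomous vehicles. Most SLAM algorithms are based on the assumption that landmarks are static. In the real world, however, various dynamic objects exist, and they degrade pose estimation accuracy. In addition, temporarily static objects, which are static during observation but move while out of sight, trigger false-positive loop closures. To overcome these problems, we propose a novel visual-inertial SLAM framework, called DynaVINS, which is robust against both dynamic objects and temporarily static objects. In our framework, we first present a robust bundle adjustment that can reject features from dynamic objects by leveraging pose priors estimated by IMU preintegration. Then, keyframe grouping and multi-hypothesis-based constraint grouping methods are proposed to reduce the effect of temporarily static objects in loop closing. Subsequently, we evaluate our method on a public dataset that contains numerous dynamic objects. Finally, the experimental results corroborate that DynaVINS achieves promising performance compared with other state-of-the-art methods by successfully rejecting the effects of dynamic and temporarily static objects. Our code is available at https://github.com/url-kaist/dynavins.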
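We introduce the Korean Language Understanding Evaluation (KLUE) benchmark. KLUE is a collection of 8 Korean natural language understanding (NLU) tasks: topic classification, semantic textual similarity, natural language inference, named entity recognition, relation extraction, dependency parsing, machine reading comprehension, and dialogue state tracking. We build all of the tasks from scratch from diverse source corpora while respecting copyrights, to ensure accessibility for anyone without any restrictions. With ethical considerations in mind, we carefully design the annotation protocols. Along with the benchmark tasks and data, we provide suitable evaluation metrics and fine-tuning recipes for pretrained language models for each task. We also release pretrained language models (PLMs), KLUE-BERT and KLUE-RoBERTa, to help reproduce baseline models on KLUE and thereby facilitate future research. We make a few interesting observations from preliminary experiments with the proposed KLUE benchmark suite, which already demonstrate its usefulness. First, we find that KLUE-RoBERTa-large outperforms other baselines, including multilingual PLMs and existing open-source Korean PLMs. Second, we see minimal degradation in performance even when we replace personally identifiable information in the pretraining corpus, suggesting that privacy and NLU capability are not at odds with each other. Lastly, we find that BPE tokenization combined with morpheme-level pre-tokenization is effective in tasks involving morpheme-level tagging, detection, and generation. In addition to accelerating Korean NLP research, our comprehensive documentation of how KLUE was created will help others create similar resources for other languages in the future. KLUE is available at https://klue-benchmark.com.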
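Fake news, false or misleading information presented as news, has a significant impact on many aspects of society, such as politics and healthcare. Because fake news is deceptive by nature, applying natural language processing (NLP) techniques to the news content alone is insufficient. Multi-level social context information (news publishers and engaged users on social media) and the temporal information of user engagement are both important for fake news detection. Using this information properly, however, introduces three chronic difficulties: 1) multi-level social context information is hard to use without information loss, 2) temporal information is hard to use together with multi-level social context information, and 3) news representations that combine multi-level social context and temporal information are hard to learn in an end-to-end manner. To overcome all three difficulties, we propose a novel fake news detection framework, Hetero-SCAN. We use meta-paths to extract meaningful multi-level social context information without loss. A meta-path, a composite relation connecting two node types, is used to capture the semantics of a heterogeneous graph. We then propose meta-path instance encoding and aggregation methods to capture the temporal information of user engagement and to produce news representations end-to-end. In our experiments, Hetero-SCAN yields consistent performance improvements over state-of-the-art fake news detection methods.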
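To make the meta-path idea in the abstract above concrete, here is a minimal sketch of enumerating meta-path instances in a small heterogeneous graph. It is only an illustration: the node names, the relation names (`publishes`, `engaged_by`), and both helper functions are hypothetical, and the encoding and aggregation steps the abstract mentions are not shown.

```python
from collections import defaultdict


def build_adjacency(edges):
    """Index typed edges as {(source, relation): [destinations]}."""
    adjacency = defaultdict(list)
    for source, relation, destination in edges:
        adjacency[(source, relation)].append(destination)
    return adjacency


def meta_path_instances(adjacency, start_nodes, relations):
    """Enumerate every node sequence that follows `relations` in order."""
    paths = [[node] for node in start_nodes]
    for relation in relations:
        # Extend each partial path by one hop; paths with no matching edge are dropped.
        paths = [
            path + [nxt]
            for path in paths
            for nxt in adjacency.get((path[-1], relation), [])
        ]
    return paths


# Toy heterogeneous graph (all names are made up for illustration).
edges = [
    ("pub_A", "publishes", "news_1"),
    ("pub_A", "publishes", "news_2"),
    ("news_1", "engaged_by", "user_x"),
    ("news_1", "engaged_by", "user_y"),
    ("pub_B", "publishes", "news_3"),
]
adjacency = build_adjacency(edges)

# Meta-path publisher -> news -> user, expressed as a sequence of relations.
instances = meta_path_instances(adjacency, ["pub_A"], ["publishes", "engaged_by"])
```

Here `instances` holds the two paths from `pub_A` through `news_1` to `user_x` and `user_y`; `news_2` contributes nothing because it has no engaged users in the toy graph.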
The development of social media user stance detection and bot detection methods relies heavily on large-scale, high-quality benchmarks. However, in addition to low annotation quality, existing benchmarks generally have incomplete user relationships, which hinders graph-based account detection research. To address these issues, we propose a Multi-Relational Graph-Based Twitter Account Detection Benchmark (MGTAB), the first standardized graph-based benchmark for account detection. To our knowledge, MGTAB is built on the largest original dataset in the field, with over 1.55 million users and 130 million tweets. MGTAB contains 10,199 expert-annotated users and 7 types of relationships, ensuring high-quality annotation and diversified relations. In MGTAB, we extract the 20 user property features with the greatest information gain, together with user tweet features, as the user features. In addition, we perform a thorough evaluation of MGTAB and other public datasets. Our experiments show that graph-based approaches are generally more effective than feature-based approaches and perform better when multiple relations are introduced. By analyzing the experimental results, we identify effective approaches for account detection and suggest potential future research directions in this field. Our benchmark and standardized evaluation procedures are freely available at: https://github.com/GraphDetec/MGTAB.
The interview has long been regarded as one of the most crucial steps in recruitment. To prepare fully for interviews with recruiters, job seekers usually practice with mock interviews among themselves. However, a mock interview with peers is generally far from the real interview experience: the mock interviewers are not guaranteed to be professional and are unlikely to behave like real interviewers. Due to the rapid growth of online recruitment in recent years, recruiters tend to hold interviews online, which makes it possible to collect real interview data from real interviewers. In this paper, we propose a novel application named EZInterviewer, which aims to learn from online interview data and provide mock interview services to job seekers. The task is challenging in two ways: (1) interview data are now available but still low-resource; (2) generating meaningful and relevant interview dialogs requires a thorough understanding of both resumes and job descriptions. To address the low-resource challenge, EZInterviewer is trained on a very small set of interview dialogs. The key idea is to reduce the number of parameters that rely on interview dialogs by disentangling the knowledge selector and the dialog generator, so that most parameters can be trained with ungrounded dialogs and resume data, neither of which is low-resource. Evaluation results on a real-world job interview dialog dataset indicate that we achieve promising results in generating mock interviews. With the help of EZInterviewer, we hope to make mock interview practice easier for job seekers.
Dynamic treatment regimes assign personalized treatments to patients sequentially over time based on their baseline information and time-varying covariates. In mobile health applications, these covariates are typically collected at different frequencies over a long time horizon. In this paper, we propose a deep spectral Q-learning algorithm, which integrates principal component analysis (PCA) with deep Q-learning to handle the mixed-frequency data. In theory, we prove that the mean return under the estimated optimal policy converges to that under the optimal policy and establish its rate of convergence. The usefulness of our proposal is further illustrated via simulations and an application to a diabetes dataset.
Temporal sentence grounding (TSG) aims to identify the temporal boundary of a specific segment in an untrimmed video given a sentence query. All existing works first use a sparse sampling strategy to extract a fixed number of video frames and then conduct multi-modal interactions with the query sentence for reasoning. However, we argue that these methods have overlooked two important issues: 1) Boundary-bias: The annotated target segment generally refers to two specific frames as the corresponding start and end timestamps. The video downsampling process may lose these two frames and take adjacent irrelevant frames as the new boundaries. 2) Reasoning-bias: Such incorrect new boundary frames also lead to reasoning bias during frame-query interaction, reducing the generalization ability of the model. To alleviate the above limitations, in this paper, we propose a novel Siamese Sampling and Reasoning Network (SSRN) for TSG, which introduces a siamese sampling mechanism to generate additional contextual frames that enrich and refine the new boundaries. Specifically, a reasoning strategy is developed to learn the inter-relationships among these frames and generate soft labels on boundaries for more accurate frame-query reasoning. Such a mechanism can also supplement the sparsely sampled frames with the missing consecutive visual semantics for fine-grained activity understanding. Extensive experiments demonstrate the effectiveness of SSRN on three challenging datasets.
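The boundary-bias issue described above can be illustrated with a small worked example. The sketch below assumes uniform sparse sampling and nearest-frame snapping, neither of which is specified in the abstract; the function names and the numbers are hypothetical, and this is not SSRN itself.

```python
def uniform_sample_indices(num_frames, num_samples):
    """Pick num_samples frame indices spread evenly over [0, num_frames - 1]."""
    if num_samples < 2 or num_frames < num_samples:
        raise ValueError("need 2 <= num_samples <= num_frames")
    step = (num_frames - 1) / (num_samples - 1)
    return [round(i * step) for i in range(num_samples)]


def snap_to_sampled(frame, sampled):
    """Return the sampled index closest to an annotated frame index."""
    return min(sampled, key=lambda s: abs(s - frame))


def boundary_shift(start, end, num_frames, num_samples):
    """Show where annotated boundaries land after sparse sampling.

    Returns (new_start, new_end, total_shift_in_frames).
    """
    sampled = uniform_sample_indices(num_frames, num_samples)
    new_start = snap_to_sampled(start, sampled)
    new_end = snap_to_sampled(end, sampled)
    return new_start, new_end, abs(new_start - start) + abs(new_end - end)
```

For a 301-frame video sampled down to 16 frames (every 20th frame), `boundary_shift(47, 133, 301, 16)` returns `(40, 140, 14)`: the annotated boundary frames 47 and 133 are not among the sampled frames, so the segment ends up bounded by frames 40 and 140 instead, which is the kind of drift the abstract calls boundary-bias.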
Human parsing aims to partition humans in images or videos into multiple pixel-level semantic parts. In the last decade, it has attracted significantly increased interest in the computer vision community and has been used in a broad range of practical applications, from security monitoring to social media to visual special effects, to name a few. Although deep learning-based human parsing solutions have made remarkable achievements, many important concepts, existing challenges, and potential research directions remain unclear. In this survey, we comprehensively review three core sub-tasks: single human parsing, multiple human parsing, and video human parsing, by introducing their respective task settings, background concepts, relevant problems and applications, representative literature, and datasets. We also present quantitative performance comparisons of the reviewed methods on benchmark datasets. Additionally, to promote sustainable development of the community, we put forward a transformer-based human parsing framework, providing a high-performance baseline for follow-up research through universal, concise, and extensible solutions. Finally, we point out a set of under-investigated open issues in this field and suggest new directions for future study. We also provide a regularly updated project page to continuously track recent developments in this fast-advancing field: https://github.com/soeaver/awesome-human-parsing.
A storyboard is a roadmap for video creation that consists of shot-by-shot images visualizing the key plots in a text synopsis. Creating video storyboards, however, remains challenging: it requires not only association between high-level texts and images but also long-term reasoning to make transitions smooth across shots. In this paper, we propose a new task called Text synopsis to Video Storyboard (TeViS), which aims to retrieve an ordered sequence of images to visualize the text synopsis. We construct a MovieNet-TeViS benchmark based on the public MovieNet dataset. It contains 10K text synopses, each paired with keyframes that are manually selected from the corresponding movies by considering both relevance and cinematic coherence. We also present an encoder-decoder baseline for the task. The model uses a pretrained vision-and-language model to improve high-level text-image matching. To improve coherence in long-term shots, we further propose to pre-train the decoder on large-scale movie frames without text. Experimental results demonstrate that our proposed model significantly outperforms other models in creating text-relevant and coherent storyboards. Nevertheless, there is still a large gap compared to human performance, suggesting room for promising future work.
In the new era of personalization, learning the heterogeneous treatment effect (HTE) has become an inevitable trend with numerous applications. Yet most existing HTE estimation methods focus on independently and identically distributed observations and cannot handle the non-stationarity and temporal dependency found in the common panel data setting. The treatment evaluators developed for panel data, on the other hand, typically ignore individualized information. To fill this gap, in this paper, we initiate the study of HTE estimation in panel data. Under different assumptions for HTE identifiability, we propose the corresponding heterogeneous one-side and two-side synthetic learners, namely H1SL and H2SL, by leveraging state-of-the-art HTE estimators for non-panel data and generalizing the synthetic control method, which allows a flexible data-generating process. We establish the convergence rates of the proposed estimators. The superior performance of the proposed methods over existing ones is demonstrated by extensive numerical studies.